This paper attempts to provide an introduction
for statisticians to the version of optimal experimental
design theory for parameter estimation in regression
models that is appropriate to dynamic systems. The
paper consists of three main parts: first, a glossary
of some terminology in control engineering and an introduction
to the main aspects of dynamic systems; second,
a summary of the principal results and patterns in
optimal experimental design theory; and third, the ways
in which the latter carry over to dynamic models. These
applications are split roughly into those involving
choice of input functions and those in which sampling
times are selected.
1. INTRODUCTION


The main objective of this work is to bring to a statistical
readership the recent activity among control engineers in the field
of experimental design. Principally, we mean to catalogue work that
parallels so-called optimal experimental design, in which important
references are Kiefer (1959, 1974), Kiefer and Wolfowitz (1960),
Fedorov (1972), Whittle (1973), and Silvey (1974). It is to be hoped
that statisticians will become more familiar with and more interested
in dynamic problems, a hope that has been expressed before in Wishart
(1969), Young (1975), Wynn (1974), and Harrison and Stevens (1976).

In the engineering literature there are useful surveys by Mehra
(1974b) and Goodwin and Payne (1977), but it is hoped that a
"translation" might be helpful to a statistical audience.

The dictionary for the translation is provided in Section 2. In
Section 3 the main features of "static" optimal design theory are laid
out and in Sections 4 and 5 generalizations of these to dynamic
systems are described. Section 6 contains a brief conclusion.

2. SOME TERMINOLOGY

A major discouragement to statisticians who approach the engineering
literature is the "wealth" of jargon. There are both new concepts
and alternative terms for familiar ideas. Many of these are
discussed at length in the survey by Wishart (1969) of the deterministic
optimal control problem, and here we give but a brief introduction
to the new language, with special regard to the problems related
to experimental designs.

Since "time" is an essential feature of dynamic systems, we will
be concerned with stochastic processes, which may be described in
discrete- or continuous-time and which may be uni- or multi-variate,
stationary or nonstationary.

The system itself, which the statistician would be more likely
to call the model, generally involves processes of three types:
inputs, outputs, and noise. (We shall see later that a fourth category,
the state, is often used, but it arises less directly and we
delay its description for the time being.) The inputs, or controls,
are generally open to choice, the outputs may be observed by the
experimenter, and the noise is random disturbance, which may be
observation error or a contribution to the dynamic evolution of the
process.

A further component of the system is a set of parameters,
conceptually familiar to the statistician.

As an exercise in the terminology, let us consider the following
simple model:

$$y(t) - a_1(t)\,y(t-1) - a_2(t)\,y(t-2) = b_1(t)\,u(t) + e(t), \quad t = 1, 2, \ldots \tag{1}$$

with some initial conditions such as $y(0) = y(-1) = 0$. The $\{y(t)\}$ are
the outputs, $\{u(t)\}$ are the inputs, $\{e(t)\}$ the noise, and the parameters
are $(a_1(t), a_2(t), b_1(t))$, along with the statistical description
of the noise process, which almost always has zero mean. In
most discrete-time problems the noise is assumed to be normally
distributed (Gaussian to the engineers) and if the $\{e(t)\}$ are uncorrelated
and identically distributed, the noise is said to be white because
of its consequently flat spectral density. Often

$$a_1(t) = a_1 \quad \text{for all } t,$$

and similarly for $a_2(t)$ and $b_1(t)$. The system is then called
time-invariant, as opposed to time-varying, and (1) becomes

$$y(t) - a_1\,y(t-1) - a_2\,y(t-2) = b_1\,u(t) + e(t), \quad t = 1, 2, \ldots \tag{2}$$
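To make the roles of input, output and noise in the time-invariant model (2) concrete, the following is a minimal simulation sketch. The function name `simulate_arx2`, its arguments and the use of NumPy are illustrative assumptions and are not part of the original paper; the noise is taken to be white and Gaussian, and the initial conditions are $y(0) = y(-1) = 0$ as in the text.

```python
import numpy as np

def simulate_arx2(a1, a2, b1, u, noise_sd=1.0, seed=0):
    """Simulate y(t) - a1*y(t-1) - a2*y(t-2) = b1*u(t) + e(t), t = 1..N.

    u[t-1] holds the input u(t). Initial conditions are y(0) = y(-1) = 0.
    Returns the outputs y(1..N) and the white Gaussian noise e(1..N).
    """
    rng = np.random.default_rng(seed)
    n = len(u)
    e = rng.normal(0.0, noise_sd, size=n)
    # y_ext[0] holds y(-1), y_ext[1] holds y(0), y_ext[t + 1] holds y(t).
    y_ext = np.zeros(n + 2)
    for t in range(1, n + 1):
        y_ext[t + 1] = (a1 * y_ext[t] + a2 * y_ext[t - 1]
                        + b1 * u[t - 1] + e[t - 1])
    return y_ext[2:], e
```

With `noise_sd=0.0` the recursion is deterministic, which gives a quick check of the indexing: a unit impulse input with $a_1 = 0.5$, $a_2 = 0.2$, $b_1 = 2$ produces the outputs 2, 1, 0.9.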

Let us suppose for the moment that the input and output processes
are scalars. Then the control engineers would describe (2) as the
input-output representation of a linear, time-invariant, discrete-time,
stochastic, single-input-single-output system. The antonyms of all
the adjectives are obvious. By "linear" is meant linearity in the
processes, not the parameters, although, apart from the parameters in
the noise process, we do have this sort of linearity as well. If the
$\{e(t)\}$ are normally, identically and independently distributed with
zero means, we might augment the description by adding that the system
is "driven by white Gaussian noise."

Of course, for the above example the familiar time-series language
of Box and Jenkins (1976) is also used, and the concept of
stationarity is also of concern to the engineers.

The recursive nature of (2) leads to the possibility of constructing
a generating function version. Thus if we denote by

$$Y(z) = \sum_{i=0}^{\infty} z^{i}\, y(i)$$

the z-transform of the output process, and so on, and if we take
$e(0) = 0$, (2) can be written

$$A(z)\,Y(z) = B(z)\,U(z) + E(z),$$

where

$$A(z) = 1 - a_1 z - a_2 z^2$$

and

$$B(z) = b_1.$$
Going a stage further, we have, if $A(z)^{-1}$ exists,

$$Y(z) = H_1(z)\,U(z) + H_2(z)\,E(z),$$

where $H_1(z) = A(z)^{-1} B(z)$ and $H_2(z) = A(z)^{-1}$ are called the transfer
functions from the input and the noise, respectively, to the output.
For more discussion of this formulation, see Wishart (1969) and
Cadzow (1973).

A further important concept is that of state-space models, which
revolutionized control theory methodology. The principal objective,
in discrete-time systems, is to write the model description as a set
of first-order recursions on the so-called state variable(s), coupled
with an equation relating the observation or output at time t with
the state variable(s) and input variable(s) at time t. We should
therefore have, for a linear time-invariant system, a model of the
form

$$\begin{aligned}
x(t+1) &= G\,x(t) + H\,u(t) + F\,e(t) \\
y(t) &= B\,x(t) + C\,u(t) + D\,n(t), \quad t = 1, 2, \ldots
\end{aligned} \tag{3}$$

where $x(t)$ denotes the vector of state variables at time t and
$\{e(t)\}$ and $\{n(t)\}$ are noise processes.

Systems can have both an input-output and a state-space
representation. For (2), if we define two state variables by

$$x_1(t) = y(t-1), \qquad x_2(t) = y(t-2),$$

then we can replace (2) by (3) with

$$G = \begin{pmatrix} a_1 & a_2 \\ 1 & 0 \end{pmatrix}, \quad
H = \begin{pmatrix} b_1 \\ 0 \end{pmatrix}, \quad
F = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
B = (a_1, \; a_2), \quad C = (b_1),$$

$D = (1)$ and $n(t) = e(t)$, $t = 1, 2, \ldots$, along with the initial
condition $x(1) = 0$.
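The equivalence of the two representations can be checked numerically. The sketch below assumes the state-space matrices that follow directly from the definitions $x_1(t) = y(t-1)$, $x_2(t) = y(t-2)$ together with $D = (1)$ and $n(t) = e(t)$; the matrices are derived here from those definitions, and the function names are hypothetical.

```python
import numpy as np

def state_space_matrices(a1, a2, b1):
    """Matrices of (3) implied by x1(t) = y(t-1), x2(t) = y(t-2) for model (2)."""
    G = np.array([[a1, a2], [1.0, 0.0]])
    H = np.array([b1, 0.0])
    F = np.array([1.0, 0.0])
    B = np.array([a1, a2])
    C, D = b1, 1.0
    return G, H, F, B, C, D

def run_state_space(a1, a2, b1, u, e):
    """Outputs y(1..N) from the state-space form, with x(1) = 0 and n(t) = e(t)."""
    G, H, F, B, C, D = state_space_matrices(a1, a2, b1)
    x = np.zeros(2)
    y = []
    for ut, et in zip(u, e):
        y.append(B @ x + C * ut + D * et)   # observation equation
        x = G @ x + H * ut + F * et         # state recursion
    return np.array(y)

def run_input_output(a1, a2, b1, u, e):
    """Outputs y(1..N) from the input-output form (2), with y(0) = y(-1) = 0."""
    y_prev, y_prev2 = 0.0, 0.0
    y = []
    for ut, et in zip(u, e):
        yt = a1 * y_prev + a2 * y_prev2 + b1 * ut + et
        y.append(yt)
        y_prev2, y_prev = y_prev, yt
    return np.array(y)
```

Feeding the same input and noise sequences to both functions should give identical outputs, which is the sense in which the two descriptions are interchangeable.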

The state-space representation is important for various reasons.
It is easy to develop the model sequentially in the time-domain.
Using input-output models, the time-domain approach is difficult and
frequency-domain or z-transform methods depend on stationarity of the
system, which is not necessary for the state-space analysis of the
operation of the system over a finite period of time. Also, nonlinear
problems can be dealt with to some extent by linearization. Finally,
and very importantly from a statistical point of view, if the noise
processes are Gaussian and white (by preliminary transformation or
pre-whitening if necessary) and if the distribution of $x(1)$ is
Gaussian, then the posterior distribution of $x(t)$, given
$y(1), \ldots, y(t-1)$, is also Gaussian, with mean $\hat{x}(t)$ and covariance
matrix $P(t)$, say.

What is more, these parameters can be computed recursively from

$$\hat{x}(t+1) = G\,\hat{x}(t) + H\,u(t) + K(t)\,v(t) \tag{4}$$

$$y(t) = B\,\hat{x}(t) + C\,u(t) + \Sigma(t)\,v(t) \tag{5}$$

$$P(t+1) = G\,P(t)\,G^T - K(t)\,K^T(t) + F\,Q\,F^T \tag{6}$$

where $K(t)$, called the gain, is defined by

$$K(t) = G\,P(t)\,B^T \left(B\,P(t)\,B^T + R\right)^{-1/2}, \tag{7}$$

and $\Sigma(t) = \left(B\,P(t)\,B^T + R\right)^{1/2}$. Here $R = \operatorname{cov}\{n(t)\}$ and
$Q = \operatorname{cov}\{e(t)\}$.

Often (4) and (5) are combined by elimination of the so-called
innovation process $v(t)$ to give the updated $\hat{x}(t+1)$ directly in
terms of $\hat{x}(t)$, $u(t)$ and $y(t)$. The innovation process can be
shown to be a sequence of uncorrelated "standardized" normal random
variables or vectors; see Kailath (1968), Frost (1968).
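One step of the recursion (4)-(7) can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the two noise processes are taken to be mutually uncorrelated, $D$ is taken to be the identity, the matrix square root is taken to be the symmetric one, and the names `sym_sqrt` and `kalman_step` are hypothetical.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def kalman_step(x_hat, P, y, u, G, H, F, B, C, Q, R):
    """One step of (4)-(7): returns x_hat(t+1), P(t+1) and the innovation v(t)."""
    S = B @ P @ B.T + R                        # innovation covariance
    Sigma = sym_sqrt(S)                        # Sigma(t) = S^{1/2}
    Sigma_inv = np.linalg.inv(Sigma)
    K = G @ P @ B.T @ Sigma_inv                # gain, equation (7)
    v = Sigma_inv @ (y - B @ x_hat - C @ u)    # standardized innovation, from (5)
    x_next = G @ x_hat + H @ u + K @ v         # equation (4)
    P_next = G @ P @ G.T - K @ K.T + F @ Q @ F.T   # equation (6)
    return x_next, P_next, v
```

Because $v(t)$ is standardized, the product $K(t)v(t)$ equals the more familiar form $G P(t) B^T (B P(t) B^T + R)^{-1}\{y(t) - B\hat{x}(t) - Cu(t)\}$, so eliminating $v(t)$ gives the update of $\hat{x}(t+1)$ directly in terms of $\hat{x}(t)$, $u(t)$ and $y(t)$.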

Equations (4), (5) and (6) form the Kalman (-Bucy) filter
(Kalman and Bucy, 1961), which is reappearing more and more in the
statistical literature (Harrison and Stevens, 1976; O'Hagan, 1978).
In the above they arise very naturally through Bayes' Theorem and
they have been derived in several other ways: as a recursive least
squares algorithm, by maximum likelihood, using projection arguments,
and by the so-called innovations approach. Stochastic approximation
methods have also been used in the theory of the Kalman filter. The
book by Jazwinskii (1970, Chapter 7) illustrates some of the
approaches; see also Willems (1978).

If the system is stationary and the "steady-state" equilibrium
has been reached, representations (4) and (5) have the interesting
feature that, because the noise process $\{v(t)\}$ is the same in both,
they can be replaced by an input-output model described by the
z-transform equation

$$Y(z) = \{B(z^{-1}I - G)^{-1}H + C\}\,U(z) + \{B(z^{-1}I - G)^{-1}K + \Sigma\}\,V(z). \tag{8}$$

The steady-state gain $K$ is computed by suppressing the time
arguments in (6) and (7) and eliminating the resulting $P$.

Perhaps this is the point to mention that linear systems theory
and Kalman filtering methods extend to more general situations.
Continuous-time processes are an obvious example, for which the
equation governing the change of state will be a first-order differential
equation. An even more general set-up for which the Kalman theory
carries over is that involving so-called distributed-parameter
systems. Here the state is a function $x(t, \omega)$ of two variables,
where $\omega$ can take uncountably many values. In an application it
might represent spatial variables. In continuous time the model
might take the form

$$\frac{\partial}{\partial t}\, x(t, \omega) = L_\omega\, x(t, \omega) + H(\omega)\,u(t, \omega) + \text{noise},$$

where $L_\omega$ is a linear operator involving $\omega$-derivatives. In
contrast, the models we have considered so far are so-called
lumped-parameter systems. They correspond to $\omega$ taking only countably or
finitely many values, so that a separate set of state variables can
be defined for each $\omega$-value. O'Hagan (1978) has used models similar
to distributed-parameter systems; a bibliography of the field is
available in Polis and Goodson (1976).

We return to the lumped-parameter case. The value of $\hat{x}(s)$ is
a natural point estimate of $x(s)$, allowing fulfillment of the activity
known as state-estimation or state-identification. If we have
available the observations $y(1), \ldots, y(t-1)$, then, according as
$s < t$, $s = t$, or $s > t$, the problem is one of smoothing, filtering,
or prediction.

Although we shall briefly mention the problem of state-estimation,
we shall concentrate more on that of parameter-estimation or
system-identification. Our design problem is to select suitable inputs over
a specified period of time, possibly infinite, to estimate the parameters
"as well as possible": optimal input signal synthesis. As in
optimal experimental design, some criterion of efficiency will be
proposed, and there generally will be some constraints on the allowable
inputs. Our problem is different from that of optimal control. There
the inputs must be chosen to keep the state vector as close as possible
to some trajectory, or to home in on some target.
The extra dimension of time has major consequences as far as
optimal design is concerned. We shall find that the attractive
linear regression theory is rarely applicable, but we shall have the
possibility of sequential design open to us. A nonsequential design,
in which the input strategy is specified before the start, could be
used, corresponding to off-line operation, but it will seem better
to choose inputs as we go along, on-line, operating an adaptive
system.

Two final concepts should be mentioned, controllability and
observability, which are closely linked with the identifiability of
parameters. Consider the deterministic model

$$\begin{aligned}
x(t+1) &= G\,x(t) + H\,u(t) \\
y(t) &= B\,x(t).
\end{aligned}$$

This system is completely observable if, after sufficient
observations, the initial state $x(1)$ can be exactly determined. It is
completely controllable if, after sufficient stages, or choices of
inputs, it is possible to translate the state to any specified
position. These concepts are important if $u(\cdot)$ and $y(\cdot)$ are vectors,
and elegant equivalent criteria exist in terms of the matrices $G$,
$H$ and $B$; see Wishart (1969).
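The criteria in terms of $G$, $H$ and $B$ are not spelled out in the text. A common form, assumed here to be what is meant, is the pair of rank conditions: the system is completely controllable when $[H, GH, \ldots, G^{n-1}H]$ has rank $n$, and completely observable when the stacked matrix of $B, BG, \ldots, BG^{n-1}$ has rank $n$, where $n$ is the dimension of the state. A minimal sketch, with hypothetical function names:

```python
import numpy as np

def controllability_matrix(G, H):
    """[H, GH, ..., G^{n-1}H] for an n-dimensional state."""
    n = G.shape[0]
    blocks = [H]
    for _ in range(n - 1):
        blocks.append(G @ blocks[-1])
    return np.hstack(blocks)

def observability_matrix(G, B):
    """Rows B, BG, ..., BG^{n-1} stacked vertically."""
    n = G.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ G)
    return np.vstack(blocks)

def is_completely_controllable(G, H):
    return np.linalg.matrix_rank(controllability_matrix(G, H)) == G.shape[0]

def is_completely_observable(G, B):
    return np.linalg.matrix_rank(observability_matrix(G, B)) == G.shape[0]
```

Numerical rank decisions depend on a tolerance, so for nearly singular cases the default tolerance of `numpy.linalg.matrix_rank` may need adjusting.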

From our point of view, the interesting point is that if system
(3) is completely observable and controllable, then the parameters
in the Kalman filter representation (4), (5), (6) are identifiable
(Kailath, 1968, Appendix 2). Our estimation problems for
the system therefore revolve around this model, or the equivalent
input-output relationship (8); see, in particular, Section 4D.


Before attacking the optimal design problem for dynamic systems,
it is helpful to summarize the main results in "static" theory.
Helpful textbooks on dynamic systems include Cadzow (1973), Astrom
(1970), Eykhoff (1974), and Jazwinskii (1970).

3. OPTIMAL REGRESSION DESIGN

The static version of the problem concerns optimal design for
regression models. Although O'Hagan (1978) uses Bayesian methods to
develop quite a general approach, the usual starting point involves
observations of a response function which depends on $k$ unknown
parameters $\theta$, and on the site of the observation, $u$. The point $u$
is chosen from some compact design space $\mathcal{U}$. The problem is to decide
how to distribute the available observations amongst the possible
sites, or to choose an optimal design measure, which specifies the
proportion of the observations to be made at the different points in
$\mathcal{U}$. The latter problem, which gives "approximate" designs, is
theoretically much easier than the practically-motivated exact theory.
The meaning of "optimal" will be discussed presently.

The basis of the theory (see the references in Section 1) was
developed just for linear regression models with independent errors,
but it is helpful to consider a more general set-up. Let us retain
the feature of independent errors, but assume that the response
function is nonlinear in $\theta$.

Let $I(\theta, u)$ denote the Fisher information matrix corresponding
to an observation at $u$. Then, if a design $\xi$ on $\mathcal{U}$ is used, the
average per observation information matrix is

$$M(\theta, \xi) = \int_{\mathcal{U}} I(\theta, u)\,\xi(du).$$
Let $\phi$ be a real-valued convex decreasing function on the set
of $k \times k$ nonnegative definite symmetric matrices, that is, such
that $\phi(A) \le \phi(B)$ if the matrix $A - B$ is nonnegative definite.
A $\phi$-optimal design will be a design $\xi^*$ such that $\phi(M(\theta, \xi))$
is minimized at $\xi^*$. Traditional choices for $\phi$ include

(i) $\phi(\cdot) = -\log\det(\cdot)$ (D-optimality)

(ii) $\phi(\cdot) = \operatorname{tr}(\cdot)^{-1}$ (trace-optimality).

Let the Frechet directional derivative of $\phi$ from $A$ in the
direction of $B$ be denoted by $\Phi(A, B)$, and let $\Xi$ denote the class
of design measures on $\mathcal{U}$.
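For reference, the directional derivatives of the two traditional criteria can be written out explicitly. The definition of the derivative used below (the one-sided derivative along the segment from $A$ to $B$, as in Whittle, 1973) is an assumption, since the text does not state it; $A$ is taken to be nonsingular.

```latex
\Phi(A, B) \;=\; \lim_{\varepsilon \downarrow 0}
  \frac{\phi\{(1-\varepsilon)A + \varepsilon B\} - \phi(A)}{\varepsilon} .

% D-optimality: phi(A) = -log det A
\Phi(A, B) \;=\; -\operatorname{tr}\{A^{-1}(B - A)\}
           \;=\; k - \operatorname{tr}(A^{-1}B) .

% Trace-optimality: phi(A) = tr(A^{-1})
\Phi(A, B) \;=\; -\operatorname{tr}\{A^{-1}(B - A)A^{-1}\}
           \;=\; \operatorname{tr}(A^{-1}) - \operatorname{tr}(A^{-1} B A^{-1}) .
```

In the D-optimal case with $B = I(\theta, u)$, minimizing $\Phi$ over $u$ is therefore the same as maximizing $\operatorname{tr}\{M^{-1}(\theta, \xi) I(\theta, u)\}$.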

Then we have the following:


Theorem 1.

For any $\theta$,

(i) $M(\theta, \xi)$ is symmetric and nonnegative definite,
for any $\xi \in \Xi$;

(ii) $\mathcal{M}(\theta) = \{M(\theta, \xi): \xi \in \Xi\}$ is convex and compact;

(iii) the extreme points of $\mathcal{M}(\theta)$ are each of the form
$I(\theta, u)$, for some $u$; further, for any $\xi \in \Xi$,
there exists $\eta \in \Xi$ assigning positive weight
to at most $\tfrac{1}{2}k(k+1) + 1$ points in $\mathcal{U}$ and such
that $M(\theta, \xi) = M(\theta, \eta)$. (Essentially Caratheodory's
Theorem.)

Theorem 2. (cf. Whittle, 1973; White, 1973)

For any $\theta$, the following are equivalent:


If $\phi$ is differentiable at $M(\theta, \xi^*)$, we also have the
equivalents:

(iii) $\Phi(M(\theta, \xi^*), I(\theta, u)) \ge 0$ for all $u \in \mathcal{U}$;

(iv) $\Phi(M(\theta, \xi^*), I(\theta, u)) = 0$ for any $u$ weighted
positively in $\xi^*$, that is, for any $u$ in the
support of $\xi^*$.

(Thus Theorem 1 (iii) implies that a $\phi$-optimal design can be achieved
with finite support, and Theorem 2 (iv) gives a practical check for
optimality when $\phi$ is differentiable.)

Algorithm 1.

Suppose $\phi(\cdot)$ is differentiable and that $\{\alpha_n\}$ is a sequence
of numbers such that $0 < \alpha_n < 1$, $\alpha_n \to 0$ as $n \to \infty$, and
$\sum_n \alpha_n = \infty$.

Let $u_n$ be the $u \in \mathcal{U}$ that minimizes

$$\Phi\{M(\theta, \xi_n),\, I(\theta, u)\}.$$

Then, from an initial $\xi_0$ and subject to certain conditions,
the sequence of designs generated by

$$\xi_{n+1} = (1 - \alpha_n)\,\xi_n + \alpha_n\,\xi(u_n)$$

converges to a $\phi$-optimal design. $\xi(u_n)$ denotes the degenerate
design concentrated on $u_n$.
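As an illustration of Algorithm 1, the sketch below applies it to D-optimality for quadratic regression on $[-1, 1]$, a linear model chosen here only because its information matrix does not depend on $\theta$. The regressors $f(u) = (1, u, u^2)$, the candidate grid, the step sizes $\alpha_n = 1/(n+1)$ and all function names are assumptions made for the example. For D-optimality the direction that minimizes $\Phi$ is the site that maximizes $f(u)^T M^{-1} f(u)$.

```python
import numpy as np

def regressors(u):
    """f(u) = (1, u, u^2); the information from one observation is f(u) f(u)^T."""
    return np.array([1.0, u, u * u])

def variance_function(F, w):
    """d(u_i) = f(u_i)^T M^{-1} f(u_i), with M = sum_i w_i f(u_i) f(u_i)^T."""
    M = F.T @ (w[:, None] * F)
    return np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)

def d_optimal_design(grid, n_iter=2000):
    """Algorithm 1 for D-optimality on a finite grid of candidate sites."""
    F = np.array([regressors(u) for u in grid])
    k = F.shape[1]
    w = np.full(len(grid), 1.0 / len(grid))   # initial design: uniform on the grid
    for n in range(1, n_iter + 1):
        d = variance_function(F, w)
        j = int(np.argmax(d))                 # minimizes Phi = k - d(u)
        alpha = 1.0 / (n + 1)                 # 0 < alpha < 1, alpha -> 0, sum = inf
        w *= 1.0 - alpha                      # xi_{n+1} = (1 - alpha) xi_n ...
        w[j] += alpha                         # ... + alpha * (point mass at u_n)
    gap_bound = float(variance_function(F, w).max() - k)
    return w, gap_bound
```

On a grid such as `np.linspace(-1, 1, 21)` the weights should concentrate near the three sites $-1$, $0$ and $1$, and the returned `gap_bound` is the quantity $-\min_u \Phi$, which bounds how far the final design is from the optimum in terms of $\phi$.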


Bound.

If $\phi(\cdot)$ is differentiable, $\xi \in \Xi$ and $\xi^*$ is an optimal
design, then

$$\phi(M(\theta, \xi)) - \phi(M(\theta, \xi^*)) \le -\min_u \Phi\{M(\theta, \xi),\, I(\theta, u)\}.$$

This indicates how "close" $\xi$ is to the optimum.
There are many versions of the equivalence theorem, Theorem 2,
both in the statistical and in the engineering literature, but the
proofs are almost all essentially the same as that of Whittle (1973),
which, because of its generality, is satisfyingly simple and elegant.
The only radically different approach is that described for D-optimality
in terms of Lagrangian duality theory by Sibson (1972, 1974) and
Silvey and Titterington (1973).

Algorithm 1 is only one of many that have been suggested for the
computation of optimal designs; see Fedorov (1972), Wu and Wynn (1978),
Wu (1976), St. John and Draper (1975) and Titterington (1977). There
is a fundamental snag to its application, and that is its dependence
on $\theta$. It seems that we have to know the true value of $\theta$ in order
to calculate an optimal design for estimating it! In the special case
of linear regression ("linear" in $\theta$) it is easy to see that the
$\theta$-dependence disappears, so that an optimal design can, in principle,
be computed before the experiment starts. In this case $M(\cdot)$ is
proportional to the inverse of the covariance matrix of the
least-squares estimator of $\theta$. In the nonlinear case there are three
possible general approaches.

(i) Apply Algorithm 1 with a prior estimate,
$\theta_0$, of $\theta$, and generate an "off-line" design
as for the linear case.

(ii) Propose some weighting function $W(\cdot)$ on the
parameter space, $\Omega$. This may or may not be a
formal prior density. Then construct either

$$M(\xi) = \int_\Omega M(\theta, \xi)\,W(d\theta)$$

or a new criterion

$$\phi_W(\xi) = \int_\Omega \phi\{M(\theta, \xi)\}\,W(d\theta).$$

In both cases equivalence theorems can be
written down and proved in the usual way; see
Mehra (1974b), and Lauter (1974).

(iii) Carry out a sequential design procedure, or
on-line input synthesis. One such modification
is the following:

Algorithm 2.

Suppose $\phi$ is differentiable and that after $n$ observations
have been made, corresponding to a design $\xi_n$, an estimate $\hat\theta_n$ is
available for $\theta$.

Suppose $u_n$ minimizes $\Phi\{M(\hat\theta_n, \xi_n),\, I(\hat\theta_n, u)\}$.

Take an observation at $u_n$, set $n = n + 1$, update the design
measure and repeat the procedure.
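A small sketch of Algorithm 2 for a one-parameter nonlinear model may help fix ideas. Everything specific in it is an assumption made for illustration: the response $\exp(-\theta u)$ with additive Gaussian noise, the grid-search least-squares estimator, and the function name. With a single parameter, $\Phi$ for D-optimality is $1 - I(\hat\theta_n, u)/M(\hat\theta_n, \xi_n)$, so the next site simply maximizes $I(\hat\theta_n, u) \propto u^2 \exp(-2\hat\theta_n u)$ and the current design measure does not affect the choice.

```python
import numpy as np

def sequential_design(theta_true, theta_init, n_obs, u_grid, theta_grid,
                      noise_sd=0.05, seed=0):
    """One-step-ahead sequential design for y = exp(-theta * u) + noise."""
    rng = np.random.default_rng(seed)
    theta_hat = theta_init
    us, ys = [], []
    for _ in range(n_obs):
        # Information at each candidate site, up to the factor 1 / noise_sd^2.
        info = u_grid ** 2 * np.exp(-2.0 * theta_hat * u_grid)
        u_n = float(u_grid[np.argmax(info)])
        y_n = float(np.exp(-theta_true * u_n) + rng.normal(0.0, noise_sd))
        us.append(u_n)
        ys.append(y_n)
        # Re-estimate theta by least squares over a grid of candidate values.
        U, Y = np.array(us), np.array(ys)
        resid = Y[None, :] - np.exp(-theta_grid[:, None] * U[None, :])
        theta_hat = float(theta_grid[np.argmin((resid ** 2).sum(axis=1))])
    return np.array(us), np.array(ys), theta_hat
```

The chosen sites track $1/\hat\theta_n$ as the estimate is revised, which shows in miniature how the design adapts to the data; no claim is made here about convergence.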

Algorithms of this type have been considered by Fedorov and
Malyutov (1972), White (1975), and Ford (1976), and in the engineering
literature, as will be reported later. Chernoff (1953) discussed
the awkward dependence of optimal designs on $\theta$ and he encouraged
the development of sequential procedures in Chernoff (1975).

Convergence of these algorithms is very hard to prove. It would
be plausible if the sequence $\{\hat\theta_n\}$ were consistent, but this itself
is difficult because of the complicated statistical properties of, say,
the sequence of maximum likelihood estimators of $\theta$; see White (1975),
Ford (1976), and Goodwin and Payne (1977, p. 115). Algorithm 2 does,
however, seem to be a workable and helpful way of dealing with the
problem of nonlinearity. Admittedly, it is only a "one-step-ahead"
procedure, but more sophisticated methods incur very heavy
computations. This is evident in the Bayesian approach of O'Hagan (1978),
and is one feature that merits further research.

This concludes a brief summary of the major results in optimal
experimental design. We only mention the important topics of exact
designs, designs for special purposes such as model choice, and the
problems that arise with designs with singular information matrices,
which may lead to lack of differentiability of $\phi$. This occurs
particularly when only a subset of the parameters is of interest.

4. EXTENSION OF OPTIMAL DESIGN THEORY TO INPUT SIGNAL DESIGN IN
DYNAMIC SYSTEMS

In this section we illustrate how the results from Section 3
find application in the input-synthesis problem in dynamic system
identification, as the engineers would describe it. A few typical
papers are summarized and reference is made to other similar work.

A. Box and Jenkins (1976, Appendix A11.2)

The model considered here is a very simple input-output
relationship, but it brings out several important points.

$$y(t+1) - a_1\,y(t) = b_1\,u(t) + e(t),$$

where all variables are scalar, $|a_1| < 1$ and $\{e(t)\}$ is white
Gaussian noise. (In the following, the engineering terminology will
be introduced more and more.) The parameters $a_1$ and $b_1$ are to be
estimated.

Box and Jenkins consider choosing input processes to maximize
the determinant of the long-term information matrix, subject to
constraints on the input. They consider constraints of the form

(i)  0^  fixed,  (ii)  a*  fixed,  (iii)  fixed. 

The solutions are that in cases (i) and (ii) first-order
autoregressions are optimal, whereas in case (iii) white noise is best. Major
problems are that the parameters in the autoregressions depend on
$a_1$ and $b_1$. In the dynamic case, therefore, linearity of the model
in the parameters does not usually guarantee the possibility of
off-line design.

Similar problems are discussed by Ng et al. (1977) for
higher-order autoregressions. As shown by Levin (1960) the optimal input
can be computed "off-line" if the input-output relationship is a
moving average.

B. Zarrop, Payne and Goodwin (1975)

A more complicated stationary input-output representation
is considered in this paper:

$$\sum_{i=0}^{n} a_i\,y(t-i) = \sum_{i=0}^{n} b_i\,u(t-i) + \sum_{i=0}^{n} c_i\,e(t-i). \tag{10}$$

Each of $y(t)$, $u(t)$ and $e(t)$ is considered to be a vector
(multi-input-multi-output), the $\{e(t)\}$ are taken to be independent
normal $(0, \Sigma)$, $a_0$ and $c_0$ are identity matrices and the parameters
of interest, $\theta$, are the elements of $(a_1, \ldots, a_n, b_0, \ldots, b_n, c_1, \ldots, c_n, \Sigma)$.
From the log-likelihood of an N-sample the Fisher information matrix
is computed. The asymptotic per observation information matrix is
expressed in terms of the normalized spectral distribution function,
$F_u$, of the input process and it can be written as

$$M(\theta, F_u) = \text{constant} + \text{term "linear" in } dF_u(\cdot).$$
The input spectral density takes the role of the design measure
in Section 3 and theorems exactly analogous to Theorems 1 and 2 are
available, although the proof of the latter is not given in this
reference, but in Goodwin and Payne (1977) and Mehra (1974b, 1976a).
An important modification to Theorem 1 is the definition of the
extreme points of the design space, which is now the range of
frequencies $(-\pi, \pi)$. The extreme points correspond to pure sine-wave inputs
and Caratheodory's Theorem implies that, if $\theta$ is $k$-dimensional, an
optimal input process can be constructed as a linear combination of
at most $\tfrac{1}{2}k(k+1)$ sinusoids. This frequency-domain approach
has a startling advantage in that the design space is a finite
interval, although again the specific optimal frequencies and amplitudes
are $\theta$-dependent. Mehra (1976a) suggests substituting a prior
estimate $\theta_0$, and proposes a version of Algorithm 1 for computing the
optimal design on $(-\pi, \pi)$.

Other papers related directly to this problem are Payne et al.
(1975) and Viort (1972). The latter work seems to have been the first
attempt to investigate D-optimality in dynamic systems. In general,
all these papers concentrate on D-optimality, although the basic
theorems have much wider validity, as in Section 3.

C. Keviczky (1975)

Keviczky considers a scalar (single-input-single-output) version
of (10), specifically of the form

$$\sum_{i=0}^{n} a_i\,y(t-i) = \sum_{i=0}^{m} b_i\,u(t-i) + \lambda \sum_{i=0}^{r} c_i\,e(t-i),$$

where $a_0 = c_0 = 1$, $m \le n$, $r \le n$. The errors are identically
distributed and Gaussian, with zero means. He considers separately the
cases of $\{e(t)\}$ uncorrelated and correlated, and he works with
finitely many observations and therefore in the time-domain. As in B,
he constructs the Fisher information matrix, regarding as the $k$
parameters $\theta^T = (b_0, \ldots, b_m, a_1, \ldots, a_n)$, and, for the uncorrelated
errors case, derives a recursion for the determinant of the
covariance matrix of the least-squares estimators of the parameters, from
the N-observation case to that of N + 1. Using this, he is able to
choose $u(N+1)$ to maximize the increase in the D-optimality
criterion subject to an amplitude constraint

$$-U \le u(N+1) \le U.$$

As often happens, it is optimal to take $|u(N+1)| = U$.
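The reason an extreme input is optimal can be seen from a generic one-step determinant update. The sketch below is not Keviczky's recursion, which is not reproduced in the text; it assumes the standard identity $\det(M + \varphi\varphi^T) = \det(M)\,(1 + \varphi^T M^{-1} \varphi)$, with a regressor $\varphi$ whose first element is the new input $u(N+1)$ and whose remaining elements are already known. The names `det_gain` and `choose_next_input` are hypothetical.

```python
import numpy as np

def det_gain(P, u, past_regressors):
    """Factor by which det(M) grows when phi = (u, past_regressors) is added.

    P is the inverse of the current (unnormalized) information matrix M.
    """
    phi = np.concatenate(([u], past_regressors))
    return 1.0 + phi @ P @ phi

def choose_next_input(P, past_regressors, U):
    """Choose u in [-U, U] to maximize the determinant gain.

    The gain is a convex quadratic in u when P is positive definite, so its
    maximum over the interval is attained at one of the end points.
    """
    return max((-U, U), key=lambda u: det_gain(P, u, past_regressors))
```

Since only the two end points need to be compared, the on-line choice of $u(N+1)$ costs two quadratic-form evaluations per step in this simplified setting.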

When the errors are correlated, optimal design has to be based
on the information matrix, which involves the usual difficulty of
ignorance about $\theta$.

A summary of this approach is given by Goodwin and Payne (1977),
and recursive design is also described by Arimoto and Kimura (1971),
using a Bayesian information-theoretic viewpoint.


D. Mehra (1974b)

This paper reviews the field quite fully and discusses
explicitly state-space models like (3). Controllability and
observability are assumed and the identifiable form of the system given by
(4) - (6) is considered. As in B and C, the log-likelihood from
N observations is written down as

$$L(\theta) = \text{constant} - \tfrac{1}{2}\sum_{t=1}^{N} \left\{ v^T(t)\,v(t) + 2\log|\Sigma(t)| \right\},$$

where $\theta$ denotes the set of $k$ parameters. The Fisher information
matrix related to, say, a scalar input sequence $u_N^T = (u(1), \ldots, u(N))$
can be evaluated, after some effort (details in Mehra, 1976b), and
it turns out that

$$M(\theta, \xi) = \int_{\mathcal{U}_N} W(\theta, \xi(u_N))\,\xi(du_N) + A(\theta) = W(\theta, \xi) + A(\theta),$$

where $\xi$ is a measure on the space $\mathcal{U}_N$ of all possible $u_N$, and
each element of $W(\theta, u_N)$ is a quadratic form in $u_N$. Typical input
constraints, defining $\mathcal{U}_N$, are

(a) $u_N^T u_N \le 1$ (energy constraint)

(b) $|u(t)| \le 1$, $t = 1, \ldots, N$ (amplitude constraints).

Again, D-optimality is considered and direct analogues of
Theorems 1 and 2, Algorithm 1 and the Bound (Mehra, 1976a) are
provided.

In this special case,

$$\Phi\{M(\theta, \xi),\, I(\theta, u_N)\} = \operatorname{tr}\{M^{-1}(\theta, \xi)\,W(\theta, \xi)\} - \operatorname{tr}\{M^{-1}(\theta, \xi)\,W(\theta, \xi(u_N))\},$$

and the equivalence theorem is given in terms of this. The
formulation given here follows Mehra (1976a, pp. 230-249) more closely. In
Mehra (1974b) a prior distribution is assumed for $\theta$ and results
given in terms of

$$M(\xi) = E_\theta\, M(\theta, \xi).$$

Implementation of Algorithm 1 involves, at stage $n$, the choice
of a $u_N^{(n)} \in \mathcal{U}_N$ to maximize

$$\operatorname{tr}\{M^{-1}(\theta, \xi_n)\,W(\theta, \xi(u_N))\}, \tag{11}$$

and this itself is a complex procedure. Criterion (11) is quadratic
in $u_N$, so in the bounded-energy case (a) we have an eigenfunction
problem and in the linearly-constrained bounded-amplitude case (b) a
quadratic programming problem, which results in a bang-bang input;
that is, $|u(t)| = 1$, for each $t$.

Even when an optimal design on $\mathcal{U}_N$ has been computed, it is not
clear how to apply it, being a probability measure on the set of
N-stage inputs. Mehra (1974b) suggests some "concatenation" in which
parts of the positively weighted $\{u_N\}$ (those in the support of the
design) are applied in turn.

In the stationary case, Mehra considers computation of long-term
optimal systems by frequency-domain analysis as in B above, pointing
out the computational attractiveness related to the simple design
space $(-\pi,\pi)$. This approach is possible because the equilibrium
model can be represented in input-output terms as in (8) in
Section 2.

Mehra (1974b) indicates generalizations to continuous-time,
nonlinear and distributed parameter systems and gives some
continuous-time illustrative examples. The only complication in the
continuous-time frequency-domain approach is that the design space of
frequencies is $(-\infty,\infty)$.


E.  Aoki  and  Staley  (1970) 

Although this paper is more tenuously dependent on the work
of Section 3, it is appropriate to mention it now because it uses an
optimality criterion related to B above. The authors consider the
autoregressive model

$$ \sum_{i=0}^{k} a_i\, y(t-i) = u(t) + e(t), \qquad t = 1,2,\ldots $$

where the {e(t)} are independent $N(0,\sigma^2)$, $a_0 = 1$ and suitable
initial conditions are specified. N observations have to be made to
estimate $\theta = (a_1,\ldots,a_k)$ and the criterion used is

$$ \operatorname{tr}\{M(\theta,\xi)\}, \tag{12} $$

where $\xi$ is a design on $U_N$ and M is the Fisher information matrix.

For this criterion, we can find an optimal design that is
degenerate, concentrated on one point in the design space, that is,
on one set of N inputs. In the bounded energy case, therefore, we must
maximize

$$ \operatorname{tr}\{\mathcal{I}(\theta,u_N)\}, $$

which turns out to be quadratic in $u_N$, subject to the quadratic
constraint $u_N^T u_N \le 1$. As in the treatment of (11) in D above,
we must solve an eigenproblem.
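
To make the quadratic structure concrete, here is a small sketch
under stated assumptions rather than a reproduction of Aoki and
Staley's computations. It assumes zero initial conditions and known
parameter values (the values of $a_1, a_2$ below are hypothetical).
Because the derivative of the residual with respect to $a_i$ is
y(t-i), the input-dependent part of the trace of the information
matrix is $\sigma^{-2} \sum_i \sum_t \bar{y}(t-i)^2$, where $\bar{y}$
is the noise-free response. Since $\bar{y}$ is linear in the inputs,
this is a quadratic form $u_N^T Q u_N$, and the bounded-energy
optimum is a leading eigenvector of Q. The function names are
illustrative.

```python
import numpy as np

def impulse_response(a, n):
    """h(0..n-1) for sum_{i=0}^k a_i y(t-i) = u(t), with a_0 = 1."""
    k = len(a)
    h = np.zeros(n)
    h[0] = 1.0
    for j in range(1, n):
        h[j] = -sum(a[i - 1] * h[j - i] for i in range(1, min(j, k) + 1))
    return h

def trace_quadratic_form(a, n, sigma2=1.0):
    """Q such that the input-dependent part of tr(M) equals u^T Q u."""
    k = len(a)
    h = impulse_response(a, n)
    # G maps inputs to mean outputs: ybar = G @ u (lower-triangular Toeplitz).
    G = np.array([[h[r - c] if r >= c else 0.0 for c in range(n)]
                  for r in range(n)])
    Q = np.zeros((n, n))
    for i in range(1, k + 1):
        lagged = np.zeros((n, n))
        lagged[i:, :] = G[:n - i, :]    # rows give ybar(t - i), t = 1..n
        Q += lagged.T @ lagged
    return Q / sigma2

a = [-0.5, 0.2]     # hypothetical (a_1, a_2), giving a stable AR(2) model
N = 20
Q = trace_quadratic_form(a, N)
eigvals, eigvecs = np.linalg.eigh(Q)
u_opt = eigvecs[:, -1]              # unit-energy input maximizing u^T Q u
```

The noise contribution to the trace does not depend on the inputs
under these assumptions, so it is omitted from Q.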

The continuous-time version of this problem expressed in
state-space terms has been examined by Mehra (1974a). Instead of N
discrete inputs, observations are made over a finite period (0,T). The
trace criterion (12) is used, and the energy constraint is

$$ \int_0^T u(t)^T u(t)\,dt \le 1. $$

The optimal $\{u(t): 0 \le t \le T\}$ can again be regarded as an
eigenfunction. When there is a scalar parameter the problem is of
Sturm-Liouville type and the equation satisfied by the optimal input
can be regarded as a Fredholm integral equation. Various methods are
suggested for obtaining explicit solutions.

The trace criterion (12) (not the same as the usual one of
$\operatorname{tr}(\cdot)^{-1}$) was also used by Nahi and Napjus (1971)
and Lopez-Toledo (1974). Its use was criticized by Zarrop and Goodwin
(1975), with rejoinder by Mehra (1975), on the grounds that the
information matrices corresponding to the optimal designs are often
singular, so that identifiability problems may well arise. The
corresponding trivial observation in the static design case is
implicit in Silvey and Titterington (1974).
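
The static observation can be illustrated with a toy calculation. The
sketch below assumes a linear regression with a finite set of
candidate design points (the points are hypothetical) and information
matrix $M(\xi) = \sum_i w_i x_i x_i^T$. Since $\operatorname{tr} M(\xi)
= \sum_i w_i \|x_i\|^2$ is linear in the weights, it is maximized by a
one-point design, whose information matrix has rank one.

```python
import numpy as np

# Hypothetical candidate design points x_i in R^3 (one per row).
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, -1.0, 0.5]])

def info_matrix(weights, X):
    """M(xi) = sum_i w_i x_i x_i^T for a design with the given weights."""
    return (X * weights[:, None]).T @ X

# tr M(xi) = sum_i w_i ||x_i||^2 is linear in the weights, so it is
# maximized by a one-point design on the candidate with the largest norm.
norms2 = np.sum(X ** 2, axis=1)
w_trace = np.zeros(len(X))
w_trace[np.argmax(norms2)] = 1.0

M_trace = info_matrix(w_trace, X)
M_uniform = info_matrix(np.full(len(X), 0.25), X)

rank_trace = np.linalg.matrix_rank(M_trace)      # rank of trace-optimal M
rank_uniform = np.linalg.matrix_rank(M_uniform)  # rank under equal weights
```

Here the trace-optimal design has a larger trace than the
equal-weight design but a singular information matrix, so not all
components of the parameter vector are estimable.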

F.  Dogorovcev  (1971),  and  Spruill  and  Studden  (1978) 

In these papers the influence of time is felt in a slightly
different way. In Dogorovcev the response function is

$$ y(t) = \theta^T u(t) + e(t), \qquad 0 \le t \le T, \tag{13} $$

where $\theta$ and u(t), for each t, are k-vectors and {e(t)} is a
stationary process with zero mean and covariance kernel R(s,t),
defining a reproducing kernel Hilbert space (RKHS) H(R), in the
sense of Parzen (1961). $\theta$ is estimated by $\hat\theta$, the best
estimator linear in $\{y(t): 0 \le t \le 1\}$, and optimal functions
$\{u(t): 0 \le t \le 1\}$ are sought for trace optimality of
$\operatorname{cov}(\hat\theta)$ and to minimize the variance of
$c^T\hat\theta$, for a specified vector c. An
orthogonal basis can be set up in H(R) and approximately normalized
members of this basis provide the optimal input functions.

Spruill  and  Studden  (1978)  extend  the  work  to  more  complex 
responses,  dependent  on  a spatial  variable  as  well  as  t. 

G.  Miscellaneous  papers 

In Mehra (1974a) it was not necessary to choose a design on
the class of input strategies because a degenerate design was
optimal. Such a search for an optimal input process leads to a more
conventional numerical problem. The choice of inputs over the interval
(0,T), subject to some integral constraint, to minimize some
time-integrated functional is a familiar extremal problem in control
engineering, leading to solution by variational methods or by the
theory of eigenfunctions. Such methods constitute what is sometimes
called the optimal control approach.

Mehra (1974a) falls into this category, as does his precursor
Levadi (1966). This author considered the continuous-time scalar
moving-average process

$$ y(t) = \int b(t,\tau)\,u(\tau)\,d\tau + e(t), \qquad 0 \le t \le T, $$

where the function $b(\cdot,\cdot)$ contains k parameters, $\theta$. An
input process $\{u(t): 0 \le t \le T\}$ has to be chosen, subject to
$\int_0^T u^2(t)\,dt = 1$, to minimize
$\operatorname{tr}\,\mathcal{I}^{-1}(\theta,u(\cdot))$, where
$\mathcal{I}^{-1}(\cdot,\cdot)$ denotes
the covariance matrix of the best linear estimator of $\theta$. The noise
may be correlated. In this moving-average example the optimal input
is independent of $\theta$, unlike most of the cases we have considered,
and the solution, which, like Dogorovcev (1971), is based on the
RKHS formulation of Parzen (1961), is again an eigenfunction
satisfying a Fredholm integral equation.

In Goodwin (1971) a discrete-time nonlinear state-space system,
involving k parameters $\theta$, coupled with a linear observation
equation, is linearized. The performance index to be minimized is the
sum of the more usual trace criterion (see Section 3) and a penalty
function to restrict the input choice. Numerical solution is
necessary; see also Nahi and Wallis (1971).

The trace criterion has been used in single-input-single-output
input-output models by Goodwin et al. (1973) and Goodwin and Payne
(1973). Hamiltonian methods were used to compute optimal inputs,
which compared well with suboptimal strategies.

Other approaches have considered specific types of input process
and tried to optimize within the appropriate class; see Van der Bos
(1967, 1973), Litman and Huggins (1963).

5.  OPTIMAL  CHOICE  OF  SAMPLING  INSTANTS  OR  LOCATIONS

Section 4 concentrated on the problem of choosing optimal input
processes. Here we consider how, given the input processes for a
continuous-time system that can be sampled at discrete instants, the
sampling strategy should be designed. Three models are considered
below, corresponding to three groups of authors.

A.  Sacks,  Ylvisaker  and  Wahba

These authors considered response models similar to (13)
in Section 4F. Sacks and Ylvisaker (1966) looked at the scalar
parameter case, with

$$ y(t) = \theta\,u(t) + e(t), \qquad 0 \le t \le 1, $$

and they considered how to choose N distinct sampling times so as
to minimize the variance of the best linear estimator of $\theta$. In
particular, they looked at what happened as $N \to \infty$ and found that
asymptotically optimal solutions existed, provided the error
covariance kernel R(s,t) was non-smooth on s = t and provided u(t)
belonged to H(R). The optimal solutions were characterized but
their explicit computation is difficult. In later papers (1968, 1970),
they extended their work to the k-parameter case and relaxed the
aforementioned condition on R(s,t).
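
For a fixed set of sampling times the quantity being minimized is
easy to evaluate, which the following sketch illustrates. It is not
the authors' asymptotic construction. It uses the standard
generalized least squares result that the best linear estimator has
variance $1/(u^T R^{-1} u)$, where u is the vector of values
$u(t_i)$ and R is the matrix of values $R(t_i,t_j)$. The exponential
kernel and the linear input function are illustrative assumptions, as
is the function name `blue_variance`.

```python
import numpy as np

def blue_variance(times, u, kernel):
    """Variance of the best linear estimator of theta in
    y(t_i) = theta * u(t_i) + e(t_i), with cov(e(s), e(t)) = kernel(s, t)."""
    t = np.asarray(times, dtype=float)
    R = kernel(t[:, None], t[None, :])      # N x N error covariance matrix
    uvec = u(t)
    info = uvec @ np.linalg.solve(R, uvec)  # u^T R^{-1} u
    return 1.0 / info

# Illustrative assumptions: exponential kernel (non-smooth on s = t)
# and a linear input function.
kernel = lambda s, t: np.exp(-np.abs(s - t))
u = lambda t: 1.0 + t

N = 5
uniform = np.linspace(0.0, 1.0, N)      # equally spaced on [0, 1]
clustered = np.linspace(0.0, 0.1, N)    # bunched near t = 0

var_uniform = blue_variance(uniform, u, kernel)
var_clustered = blue_variance(clustered, u, kernel)
```

In this example the equally spaced design gives the smaller variance,
because the clustered observations are highly correlated and carry
little additional information.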

Wahba (1971) related the problem to one of function
approximation by splines and did more work on the computation of
optimal sequences of designs.

B.  Goodwin,  Mehra  and  others

In a series of papers starting with Goodwin et al. (1974),
linear state-space models of the following form were considered:

$$ dx(t) = Gx(t)\,dt + Hu(t)\,dt + F\,de(t), \qquad y(t) = Bx(t) + Cu(t), \tag{14} $$

with some initial conditions, where $\theta$, the parameter vector, is
made up of the unknowns in (G, H, B, C) and where {e(t)} is a
(multivariate) process with independent increments. Sampling times
for N observations have to be chosen, which leads to the
replacement of (14) by an appropriately integrated discrete recursion.
The Fisher information matrix is written down and the D-optimality and
trace-optimality criteria are considered. Goodwin et al. (1974)
and Payne et al. (1975), where for N large the frequency-domain
approach is used, discuss the improvement resulting from non-uniform
sampling intervals in a simple example and the simultaneous choice of
sampling frequency and input function. The work is summarized
in Goodwin and Payne (1976, 1977, Section 6.5).

Mehra (1976b) considers a similar model to (14), with observation
equation

$$ y(t) = Bx(t) + n(t), \qquad 0 \le t \le 1, $$

where n(t) is Gaussian white noise. He considers the choice of
measurement times not now from a parameter estimation point of view
but from one of state estimation. Instead of the Fisher information
matrix he considers P(t), the covariance matrix of the state vector,
whose evolution is governed by the Kalman equations. In particular,
he would like to "minimize" P(1). However, P(1) is hard to compute
explicitly in the continuous-time model, as a solution of a Riccati
equation, and he opts for approximations that lead back to
consideration of criteria based on the Fisher information matrix. The
paper closely parallels Mehra (1974b), with results like Theorems 1
and 2 and a computational procedure on the lines of Algorithm 1.
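
The dependence of the terminal state variance on the measurement
times can be seen in a scalar toy version of the problem. The sketch
below is an illustration under simplifying assumptions, not Mehra's
formulation: the state is scalar, measurements are taken at a finite
set of instants with independent noise of variance r, and all
numerical values are hypothetical. Between measurements the variance
follows the scalar equation $dP/dt = 2gP + f^2$, and each measurement
applies the usual Kalman update.

```python
import numpy as np

def terminal_variance(times, g, f, b, r, p0, t_end=1.0):
    """Kalman filter variance P(t_end) for the scalar system
    dx = g x dt + f dw, observed as y_i = b x(t_i) + n_i, var(n_i) = r.
    Assumes g != 0 and measurement times within [0, t_end]."""
    p, t_prev = p0, 0.0
    for t in sorted(times):
        dt = t - t_prev
        # Time update: exact solution of dP/dt = 2 g P + f^2 over dt.
        growth = np.exp(2.0 * g * dt)
        p = growth * p + f**2 * (growth - 1.0) / (2.0 * g)
        # Measurement update for a scalar observation.
        p = p * r / (b**2 * p + r)
        t_prev = t
    dt = t_end - t_prev
    growth = np.exp(2.0 * g * dt)
    return growth * p + f**2 * (growth - 1.0) / (2.0 * g)

params = dict(g=-1.0, f=1.0, b=1.0, r=0.1, p0=1.0)   # hypothetical values

p_none = terminal_variance([], **params)
p_early = terminal_variance([0.1, 0.2, 0.3], **params)
p_late = terminal_variance([0.7, 0.85, 1.0], **params)
```

With these values the late measurement times give the smallest P(1),
since information gathered early is eroded by the process noise
before the terminal time.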

Ng and Goodwin (1976) consider the model

$$ dx(t) = Gx(t)\,dt + F\,de(t), \qquad y(t) = Bx(t), $$

where e(t) is white Gaussian noise.

Using $(0,\infty)$ as the design space, they represent a sampling
strategy as a design $\xi$ on $(0,\infty)$ with, as extreme points, the
uniform sampling rates. The appropriate Fisher information matrix
satisfies Theorem 1, so that in a k-parameter problem an optimal
strategy can be achieved using at most k(k + 1) sampling
rates.

As usual, explicit results depend on knowledge of $\theta$. Methods of
coping with ignorance about $\theta$ and of frequency-domain analysis
are described. They show that it is better to concatenate
subexperiments using pure sinusoids than to carry out a single
experiment with a multi-frequency input.

C.  Seinfeld  and  others

These authors are involved with such spatial problems as
the measurement of pollution and with the location of monitoring
stations. They are therefore obliged to look at distributed-parameter
systems and to construct spatial designs that are both exact (a
general design measure will not do) and non-replicating.

With the pollution levels as state variables, the Kalman filter
equations are constructed and a criterion of "total integrated
variance" (integrated over time and space) is computed, using the
Riccati equation for the state covariance matrix P(t,u). Heavy
numerical work is necessary for optimal choice of, say, N locations.
In Yu and Seinfeld (1973) sequential choice is suggested; see also
Chen and Seinfeld (1975), Seinfeld (1976).

6.  CLOSING  REMARKS 

The problem of optimal experimental design is clearly important
in models where time is an inevitable component. It is pleasing that
the Kiefer-Wolfowitz work in static models carries over automatically
to many dynamic problems in which the underlying design space can be
a class of input processes (time-domain analysis), a range of
frequencies (frequency-domain) or a range of sampling rates (sampling
strategies).

Although the published theory does not go far beyond
D-optimality or trace-optimality, general criteria could be
considered, as in Section 3. There is not much written about
$D_s$-optimality and its counterparts, in which only s of the k
parameters are of interest, although again this would not involve any
extra fundamental ideas. Goodwin and Payne (1977) mention it in the
context of model discrimination, drawing analogy with work of
Atkinson and Cox (1974) and Atkinson and Fedorov (1975a, 1975b). The
latter two papers bear some resemblance to a paper by Gagliardi
(1967), who considers a special model choice problem involving a
finite parameter space. His methods are generalized by Kuszta and
Sinha (1976, 1977, 1978).

These are some areas where further development is desirable.
Another is the consideration of more examples, in particular less
simple ones than at present reported, although they are likely to
involve hard computational work because of the almost inevitable
dependence of the information matrix on the parameter values. Any
advances in the numerical aspects of these problems would be very
valuable.